Simulations of HIV capsid protein dimerization reveal the effect of chemistry and topography on the mechanism of hydrophobic protein association
Recent work has shown that hydrophobic protein surfaces in aqueous solution sit near a drying transition. The tendency of these surfaces to expel water from their vicinity drives the self-assembly of macromolecular complexes. In this article we use a realistic model of a biologically pertinent system to show how this phenomenon appears at the molecular level. We focus on the association of the C-terminal domain (CA-C) of the human immunodeficiency virus (HIV) capsid protein. By combining all-atom simulations with specialized sampling techniques, we measure the water density distribution during the approach of two CA-C proteins as a function of their separation and of the amino acid sequence in the interfacial region. The simulations demonstrate that CA-C protein-protein interactions sit at the edge of a dewetting transition, and that this mesoscopic manifestation of the underlying liquid-vapor phase transition can be readily manipulated by biology or protein engineering to significantly affect association behavior. While the wild type protein remains wet until contact, we identify a set of in silico mutations, in which three hydrophilic amino acids are replaced with nonpolar residues, that leads to dewetting prior to association. Whether dewetting occurs depends on the sizes and relative locations of the substituted residues, which are separated by nanometer length scales, indicating long-range cooperativity and a sensitivity to surface topography. These observations identify important details that are missing from descriptions of protein association based on buried hydrophobic surface area.



I. INTRODUCTION 

The hydrophobic effect provides a crucial driving force for the self-assembly of proteins into many biological complexes, such as viral protein coats or capsids [1, 2], cytoskeletal filaments [3], and amyloid fibrils (e.g., Refs. [4, 5]). While the surfaces of unassembled proteins are wet in solution [6, 7], assembly leads to contact surfaces that are dry [8, 9]. It has recently been shown that hydrophobic protein surfaces sit near a drying transition, which enables them to form soft interfaces with water that promote assembly. This paper shows how this phenomenon arises in a realistic model.

Many models for self-assembly and other biological association phenomena assume that binding energy is correlated with buried hydrophobic surface area (e.g., [10-12]). While this generalization has been extremely useful, its accuracy is limited because it does not account for effects such as surface roughness [13], curvature [14, 15], or long-range correlations between chemical groups [16]. Corrections arising from these effects will be most important for weak protein-protein interactions, which are ubiquitous in biological systems [17] and often essential for the formation of biological assemblages (e.g., Refs. [1, 18]). By accounting for the molecularity of water, our study provides critical details missing from surface-area-based calculations, elucidating how the geometric arrangement and sizes of different chemical groups within a hydrophobic surface determine its interactions.

Theoretical work [22-24] has shown that hydrophobic association depends on the fact that solvating a hydrophobic particle more than 1 nm in diameter leaves an excess of unsatisfied hydrogen bonds in the surrounding water, which can drive the system toward a state close to liquid-vapor coexistence at ambient conditions. A large, ideal hydrophobic surface (which experiences only repulsive excluded-volume interactions with water) pushes the system over a dewetting transition, and a liquid-vapor interface forms [22-24]. On the other hand, realistic surfaces such as those of proteins exert van der Waals and/or electrostatic attractions on water and thus remain wet. The proximity of an underlying dewetting transition is then revealed only by fluctuations of water density [25] or by the response of water density to perturbations [15, 24], such as the confinement introduced by the approach of two such surfaces. If the surfaces are sufficiently close to a dewetting transition, their approach within a critical distance can lead to dewetting and subsequent hydrophobic collapse [23, 24, 26-34]. However, the surfaces of typical proteins found in biological assemblages are geometrically rough and chemically heterogeneous, invariably including hydrophilic groups that locally stabilize liquid water [16, 32, 34-36]. It is unclear how the principles describing dewetting of idealized surfaces can be applied to such complex protein surfaces.

FIG. 1. The geometry and chemistry of the CA-C dimerization interface. (A) A space-filling model of one monomer from the CA-C dimer (PDB ID 1A43 [19]). The residues that play a significant role in dimerization [20] are color-coded by residue type: nonpolar in red, polar in green, basic in blue, and acidic in purple. Three residues that play a key role in determining water behavior, Ser-178, Glu-180, and Gln-192, are labeled. (B) A side view of the dimer interface, with Ser-178, Glu-180, and Gln-192 represented by space-filling models and the remainder of the CA-C dimer structure shown in ribbon representation. Images created with VMD [21].

Recently, Patel and coworkers developed specialized sampling techniques [25, 37] to measure water density fluctuations in the vicinity of topographically rugged interfaces, and used them to demonstrate that the model proteins BphC and melittin are close to dewetting transition boundaries [34]. Here, we apply these techniques to the association of the C-terminal domain of the HIV capsid protein (CA-C), as a model system for understanding the assembly of macromolecular complexes. The size and composition of the residues in its interface are typical of protein association interfaces. Furthermore, the CA-C dimerization interface (Fig. 1) plays an essential role in HIV capsid assembly [38-43], and recent studies suggest that it could be a highly effective target for small drug molecules that inhibit assembly [44-46]. Isolated CA-C domains dimerize in solution with a dissociation constant of about 10 μM [47], which is similar to that of full-length CA (18 μM [48]), and with structures that closely correspond to those found in mature HIV capsids [42].

We find that, although density fluctuations are enhanced with respect to those found in bulk solution, a dewetting transition does not occur during association of the wild type protein. However, the system is close to a transition, as revealed by its sensitivity to perturbations in chemistry or topography. By performing a systematic series of in silico mutations, we identify a set of three hydrophilic residues whose simultaneous mutation to nonpolar amino acids of similar size leads to dewetting during association.

The distances between mutated residues demonstrate cooperative effects on nanometer length scales, while the dependence of dewetting on the size of the substituting residues indicates that both topography and chemistry control water behavior. These results indicate that the proximate dewetting transition can be manipulated by small perturbations to the CA-C interfacial micro-architecture to effect large changes in association behavior. In contrast to the assumptions underlying traditional surface-area-based estimates of protein association behavior, our results show that the transition to dewetting does not depend only on the total buried hydrophobic surface area or the mean hydrogen bonding potential of the surface, but depends sensitively on the relative locations and sizes of the mutated residues.



SYSTEM 

We perform simulations based on the crystal structure PDB ID 1A43 [19], whose electron density closely fits that found at the hexamer-pentamer interface in electron micrographs of the mature HIV capsid [42]. As shown in Fig. 1, dimerization occurs via the mutual association of α-helix 2 (residues S178-V191). Nonpolar residues contribute approximately 1200 Å² of the solvent accessible area buried at the interface, forming a hydrophobic 'patch' at the center of the contact region that exceeds a nanometer in all directions. The CA-C dimerization interface is thus representative of capsid protein assembly interfaces and other protein-protein association surfaces in terms of structure and composition. Based on the changes in binding affinity upon mutation of each residue at the dimerization interface, association is primarily driven by hydrophobic interactions but attenuated by electrostatic effects [20, 48-52]. In particular, the residues whose mutation to alanine significantly impairs association are mostly nonpolar, whereas mutation of several polar or charged residues (Ser178, Glu180, Gln192) leads to stronger association [49].
FIG. 2. The effect of interfacial chemistry on water behavior near isolated CA-C monomers. The distribution of water density fluctuations P_v(N) is shown as a function of the number of waters N in the evaluation volume v (pictured on the right in red) in the vicinity of the CA-C dimerization interface. Distributions are shown for monomers of the wild type protein (WT), for monomers with the indicated sets of mutations: E180L, Q192L, and S178A/E180L/Q192L (TM), and for a region of bulk water with the same volume as the evaluation region.



II. RESULTS 

The CA-C interface on an isolated monomer is wet in solution but exhibits enhanced water density fluctuations.

We began by investigating the behavior of water near the dimerization interface of an isolated wild type (WT) CA-C monomer. Because the protein surface has an irregular shape, we employed an extension of the indirect umbrella sampling (INDUS) method [34, 37] that allows sampling of the probability distribution P_v(N) of the number of waters N in an arbitrarily shaped volume v [34], as described in the Methods. As shown in Fig. 2, the mean number of waters is close to that found in a region of bulk water with the same volume, reflecting the fact that the surface is wet. However, fluctuations to low densities are enhanced near the interface; i.e., P_v(N) is enhanced at low N in comparison to the distribution for bulk water. This result is consistent with observations on self-assembled monolayers and on the model proteins BphC and melittin [34].
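As a concrete illustration of the quantity being sampled, the Python sketch below counts water oxygens inside an evaluation volume v represented as a union of cubic cells, and converts a series of per-frame counts into a normalized histogram P_v(N). It is a simplified stand-in, not the INDUS implementation used here: it applies a hard indicator function, whereas INDUS uses a smoothed count so that biasing forces are well defined, and it assumes coordinates in Å with a 1 Å cell size. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, Sequence

Point = tuple[float, float, float]   # coordinates in Å
Cell = tuple[int, int, int]          # integer cell indices


def cell_index(p: Point, cell_size: float = 1.0) -> Cell:
    """Index of the cubic cell (edge `cell_size`, in Å) containing point p."""
    return (math.floor(p[0] / cell_size),
            math.floor(p[1] / cell_size),
            math.floor(p[2] / cell_size))


def count_waters(oxygens: Iterable[Point], volume_cells: set[Cell],
                 cell_size: float = 1.0) -> int:
    """Hard-cutoff number of water oxygens inside the union of cells v."""
    return sum(1 for o in oxygens if cell_index(o, cell_size) in volume_cells)


def probability_distribution(counts: Sequence[int]) -> dict[int, float]:
    """Normalized histogram P_v(N) from per-frame (unbiased) counts N."""
    hist = Counter(counts)
    total = len(counts)
    return {n: hist[n] / total for n in sorted(hist)}
```

With unbiased sampling alone, the low-N tail of such a histogram is visited only rarely, which is why the biased sampling described in the Methods is needed to resolve the enhanced low-density fluctuations.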

To understand how water near the interfacial surface responds to perturbations in the protein sequence, we measured P_v(N) in the same volume for CA-C monomers in which hydrophilic amino acids in the interfacial region were mutated to hydrophobic amino acids of similar size. The measured distributions are shown in Fig. 2 for proteins with one to three such mutations. The single point mutations lead to a small enhancement of low-N fluctuations, and as additional amino acids are mutated the low-N fluctuations are further enhanced. However, the most probable number of waters in the region changes only slightly, and in all cases the surfaces remain wet.

Water density fluctuations during association. To investigate the behavior of water as two CA-C proteins associate, we placed two monomers at different separation distances D and calculated P_v(N) in the interfacial region between the monomers (Fig. 3A). While low-N fluctuations and the probability of drying, P_v(0), increase as the WT monomers approach (Fig. 3B), the interface remains wet until the two surfaces come into contact. Specifically, while the most probable value of N decreases with the monomer separation because the interfacial volume decreases, the mean density remains essentially constant and the most probable value of N remains nonzero. Thus, we conclude that dimerization of wild type CA-C proteins does not involve a dewetting transition.

We next systematically investigated the effect of interfacial composition on water behavior during association by measuring P_v(N) distributions at a separation of D = 4 Å for a series of mutations to amino acids in the interfacial region (Fig. 4). We identified no single point mutation or double mutation that led to a dewetting transition, but the mutation Q192L led to a significant enhancement of low-N fluctuations (Fig. 4A). We also identified a double mutation, E180L/S178A, for which about 5 waters tend to vacate the interfacial region, but dewetting remains relatively improbable. In contrast, the triple mutation S178A/E180L/Q192L (denoted TM) does lead to dewetting, with a dramatic change in P_v(N) and a most probable value of N = 0 (Fig. 4B). A further mutation, S178A/E180L/E187L/Q192L (denoted QM), leads to more aggressive dewetting, as indicated by a higher relative probability P_v(0).

To identify the critical distance D_c for dewetting during the approach of these mutants, we calculated P_v(N) for a series of separation distances D. As shown in Fig. 3C, the surfaces remain wet for distances larger than D_c ≈ 4.1 Å. Near the critical value, the P_v(N) distributions are bimodal, with high probabilities for the wet (high N) and dry (low N) states separated by a low-probability intermediate region, suggesting the presence of a small barrier to dewetting. A barrier to desolvation could influence the kinetics of dimerization; however, it would be important to determine whether such a barrier exists when the proteins associate via different approach vectors.
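A bimodal distribution of this kind can be summarized by the free energy βF(N) = -ln P_v(N). The Python sketch below is our illustration, not the analysis performed in this work: it assumes P_v(N) is supplied as a dictionary over integer N, locates strict local minima of βF(N), and reports the barrier (in units of k_BT) that separates the wet basin (largest-N minimum) from the dry basin (smallest-N minimum). All function names are hypothetical.

```python
import math
from typing import Mapping, Optional


def free_energy(p: Mapping[int, float]) -> dict[int, float]:
    """beta*F(N) = -ln P_v(N) in units of k_B T, shifted so the minimum is zero.

    Bins with zero probability are dropped (their free energy is infinite).
    """
    f = {n: -math.log(pn) for n, pn in p.items() if pn > 0.0}
    f_min = min(f.values())
    return {n: f[n] - f_min for n in sorted(f)}


def local_minima(f: Mapping[int, float]) -> list[int]:
    """N values where beta*F is strictly lower than at the neighbouring sampled N."""
    ns = sorted(f)
    minima = []
    for i, n in enumerate(ns):
        left = f[ns[i - 1]] if i > 0 else math.inf
        right = f[ns[i + 1]] if i + 1 < len(ns) else math.inf
        if f[n] < left and f[n] < right:
            minima.append(n)
    return minima


def wet_to_dry_barrier(f: Mapping[int, float]) -> Optional[float]:
    """Barrier (k_B T) from the wet (largest-N) basin toward the dry (smallest-N) basin.

    Returns None when beta*F has a single minimum, i.e. no separate dry basin.
    """
    minima = local_minima(f)
    if len(minima) < 2:
        return None
    dry, wet = minima[0], minima[-1]
    top = max(f[n] for n in f if dry <= n <= wet)
    return top - f[wet]
```

A bimodal P_v(N), as found near D_c, yields two minima and a finite barrier, while a single-basin distribution returns None; with sampling noise, a light smoothing of βF(N) before locating minima may be needed.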

Dependence of dewetting on the location of mutations. Comparison of P_v(N) from various protein sequences indicates that mutations are cooperative and that their effect depends sensitively on the relative locations of the mutated residues. For example, E180L/S178A increases the probability of evacuating 5 waters, but the individual mutations E180L and S178A lead to essentially no enhancement of low-N fluctuations in comparison to the association of wild type proteins. This observation can be understood by noting that the two monomers are rotated by 180° with respect to each other in the crystal structure, meaning that E180 from monomer A is juxtaposed with S178 in monomer B. Thus, the overlapping hydrophobic area increases only if both amino acids are mutated. On the other hand, the single point mutation Q192L does enhance low-N fluctuations because the Q192 residues from the two monomers partially overlap in the associated structure. Combining these three mutations (TM) then increases the contiguous overlapping hydrophobic area sufficiently to give rise to dewetting, even though the mutated amino acids are separated by up to 10 Å. In contrast, other combinations of three mutations that did not produce as large a change in overlapping hydrophobic area did not lead to dewetting. Similarly, Q192L/E180L did not enhance low-N fluctuations of the associating monomers relative to those of Q192L. We thus conclude that perturbations that increase the hydrophobic area of an individual monomer have relatively little effect on the propensity for dewetting during association unless they increase the contiguous overlapping hydrophobic area.

Dependence of dewetting on the volume of substituted amino acids. The mutations described in the preceding paragraph were designed to substitute nonpolar amino acids that match the size and shape of the wild type residues as closely as possible. However, the contribution of a particular amino acid or group of amino acids to association is commonly assessed experimentally by substitution with alanine, and the most closely related set of mutations that has been studied experimentally is S178A/E180A/E187A/Q192A (QM-A) [49]. Given the sensitivity of dewetting to the location of mutations and other small perturbations, we anticipated that the identity of the residue substituted for the wild type amino acid could also affect water behavior. We therefore repeated our measurements for triple and quadruple mutants in which all residues were substituted with alanine: S178A/E180A/Q192A (TM-A) and QM-A. As shown in Fig. 5, the behavior of these proteins is markedly different from that of TM and QM. Both all-alanine mutation sets exhibit P_v(N) distributions similar to those of the wild type, and dewetting is not favorable at any separation distance. This observation can be understood by noting that the substituted alanine residues contribute little nonpolar surface area, and their small size in comparison to the wild type residues decreases the topographic complementarity of the associating interfaces.

FIG. 3. Water density distributions during association of CA-C and a mutant. (A) The evaluation volume v (red) superposed on the CA-C structure at separation D = 5 Å. (B) The distribution of water density fluctuations P_v(N) for wild type (WT) CA-C proteins at separations of D = 4, 4.5, and 5 Å. (C) The distribution of water density fluctuations P_v(N) during the approach of two S178A/E180L/Q192L (TM) proteins at three separations, including D = 4.1 and 4.5 Å. For (B) and (C), representative snapshots show the water within 5 Å of both proteins together with one monomer oriented to view the interface in cross-section; the solvent accessible surface of the protein is shown, with polar regions colored blue and nonpolar regions colored grey.

FIG. 4. Interfacial chemistry determines water behavior during association. The distribution of water density fluctuations P_v(N) is shown at a separation of D = 4 Å for (A) wild type (WT) CA-C proteins and the indicated single mutations, and (B) the indicated sets of multiple mutations.

FIG. 5. The distribution of water density fluctuations P_v(N) at D = 4 Å for the mutants Q192A/E180A/S178A (TM-A) and Q192A/E180A/S178A/E187A (QM-A). The smaller volume of the substituted alanine residues slightly increases the evaluation volume, leading to a larger mean number of waters than observed for the wild type (Fig. 4A).

Hydrogen bonding potential does not imply propensity for dewetting. It is important to note that the mutations QM and TM do not lead to dewetting simply because they reduce hydrogen bonding between the dimerization interface and solvent, thereby reducing the enthalpic barrier to releasing waters during drying. Both the size-preserving mutants (TM and QM) and the all-alanine mutants (TM-A and QM-A) replace the same number of hydrophilic groups at the interface, and thus should lead to an equivalent reduction in the number of hydrogen bonds between the solvated protein interface and water. To verify this supposition, we measured the number of hydrogen bonds among interfacial sidechains and water molecules for isolated monomers (Mon), the dimer complex (Dim), and monomers separated by 4 Å but solvated (Dim-Sol), for the sequences WT, TM, and TM-A. As shown in Table I, the number of hydrogen bonds is reduced by the same amount, within statistical error, for both TM and TM-A as compared to WT. The similarity in hydrogen bonding between TM and TM-A, combined with the observation that TM undergoes dewetting during association whereas TM-A does not, suggests that hydrogen bonding potential and the propensity for dewetting are not uniquely related. Since the shape of the dimer interface is better preserved in TM proteins, we conclude that the geometric complementarity of hydrophobic regions on association surfaces also plays an important role.
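The hydrogen-bond criterion behind Table I is not stated in the text. As an assumption for illustration only, the Python sketch below uses a common geometric definition: a donor-acceptor distance of at most 3.5 Å and a hydrogen-donor-acceptor angle of at most 30°. Sidechain-water hydrogen bonds would be obtained by calling `count_hbonds` twice per frame (sidechain donors with water acceptors, and water donors with sidechain acceptors) and averaging over frames. The names are hypothetical, and periodic images are ignored for brevity.

```python
import math
from typing import NamedTuple, Sequence

Vec = tuple[float, float, float]  # Cartesian coordinates in Å


class Donor(NamedTuple):
    heavy: Vec     # donor heavy atom (O or N)
    hydrogen: Vec  # hydrogen covalently bound to that atom


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(a: Vec) -> float:
    return math.sqrt(a[0] ** 2 + a[1] ** 2 + a[2] ** 2)


def is_hbond(donor: Donor, acceptor: Vec, r_max: float = 3.5,
             angle_max_deg: float = 30.0) -> bool:
    """Assumed geometric criterion: D...A <= r_max and angle H-D...A <= angle_max_deg."""
    da = _sub(acceptor, donor.heavy)
    r = _norm(da)
    if r == 0.0 or r > r_max:
        return False
    dh = _sub(donor.hydrogen, donor.heavy)
    cos_a = sum(x * y for x, y in zip(dh, da)) / (_norm(dh) * r)
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
    return angle <= angle_max_deg


def count_hbonds(donors: Sequence[Donor], acceptors: Sequence[Vec]) -> int:
    """Donor-acceptor pairs meeting the criterion in one frame (no periodic images)."""
    return sum(1 for d in donors for a in acceptors if is_hbond(d, a))
```

A donor atom carrying several hydrogens appears once per hydrogen, so each hydrogen can contribute its own bond.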
Configuration   Protein   H-bonds with waters   H-bonds with waters & sidechains
Mon             WT        32                    49
Mon             TM        22                    39
Mon             TM-A      22                    40
Dim             WT        26                    54
Dim             TM        19                    43
Dim             TM-A      21                    45
Dim-Sol         WT        33                    52
Dim-Sol         TM        20                    40
Dim-Sol         TM-A      23                    41

TABLE I. Number of hydrogen bonds for interfacial residues of the wild type (WT), S178A/E180L/Q192L (TM), and S178A/E180A/Q192A (TM-A) sequences in the case of a solvated monomer (Mon), the dimer complex (Dim), and the two monomers on the verge of association, separated by 4 Å but solvated (Dim-Sol). The table shows the average number of hydrogen bonds between sidechains of the interfacial residues and (column 3) water or (column 4) water and sidechain atoms. The standard error is approximately 2 hydrogen bonds. The interfacial residues are T148, I150, L151, D152, R154, L172, E175, S178, E180, V181, W184, M185, E187, T188, L189, V191, Q192, N193, K199, and K203.



III. DISCUSSION AND CONCLUSION 

Our computational prediction of mutations that alter the CA-C association mechanism could be tested by solution measurements of binding affinities for mutated CA-C proteins or of the assembly behavior of mutated full-length protein. The existence of dewetting implies a stronger hydrophobic contribution to the association free energy than in cases without dewetting. Therefore, we expect that S178A/E180L/Q192L (TM) and S178A/E180L/E187L/Q192L (QM) should have substantially smaller values of K_d than wild type CA-C. Notably, comparison of water behavior for the double mutant S178A/E180L and the mutants S178A/E180L/Q192L (TM) or S178A/E180L/E187L/Q192L (QM) (Fig. 4) demonstrates cooperative effects on water behavior of mutations that are separated by more than 10 Å. This cooperativity arises because the protein interface is already situated at the edge of a dewetting transition. Such a result could be surprising, given that the effects of mutations, as measured by the resulting change in an association constant, are usually additive over distances of 6 Å or more [53].

Such a comparison is complicated by the fact that the mutations will increase the binding affinity by alleviating some repulsions between charged groups in the dimer. For example, the single point mutations S178A, E180A, E187A, and Q192A all led to an increase in the dimerization affinity, likely by eliminating electrostatic repulsions in the dimer [20, 49]. This complication could be avoided by comparing dissociation constants for the triple or quadruple mutants with the analogous alanine scanning mutants: TM vs. S178A/E180A/Q192A (TM-A), or QM vs. S178A/E180A/E187A/Q192A (QM-A). Since repulsions between charged groups are eliminated in both cases, we anticipate that the size- and shape-preserving mutants TM and QM will have increased dimerization affinities in comparison to TM-A and QM-A. Interestingly, the binding affinity has already been measured for QM-A, and the results demonstrated long-range nonadditive effects of the mutations, with a smaller increase in dimerization affinity than for Q192A alone [49].
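The comparisons proposed above are naturally expressed as free energy changes. As a hedged sketch using standard thermodynamic relations (not an analysis performed in this work), the Python below converts a dissociation constant to a standard binding free energy, ΔG° = RT ln(K_d / c°) with c° = 1 M, computes ΔΔG for a mutant relative to a reference, and measures nonadditivity as the difference between a multiple mutant's ΔΔG and the sum of its single-mutant ΔΔG values. For the WT K_d of about 10 μM [47], this gives ΔG° ≈ -28.7 kJ/mol at 300 K. Any mutant K_d values passed to these functions would be placeholders, not measurements; the function names are hypothetical.

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def binding_free_energy(kd_molar: float, temperature: float = 300.0) -> float:
    """Standard dimerization free energy (kJ/mol): RT ln(K_d / 1 M)."""
    return R_KJ * temperature * math.log(kd_molar)


def ddg(kd_mutant: float, kd_reference: float, temperature: float = 300.0) -> float:
    """Change in binding free energy (kJ/mol); negative means tighter binding."""
    return R_KJ * temperature * math.log(kd_mutant / kd_reference)


def nonadditivity(ddg_multiple: float, ddg_singles: list[float]) -> float:
    """Multiple-mutant ddG minus the sum of single-mutant ddGs (0 if additive)."""
    return ddg_multiple - sum(ddg_singles)
```

A positive nonadditivity for a multiple mutant whose single mutations each strengthen binding means that its affinity gain is smaller than the sum of the individual gains, which is the pattern described above for QM-A.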

Notably, the amino acids that we have found to most strongly influence dewetting, S178, E180, and Q192, are highly conserved among HIV and SIV variants [54], despite the fact that they are not necessary for stereospecific binding (see Methods and Ref. [49]). Experimental and theoretical investigations have shown that capsid assembly reactions become trapped when protein-protein interactions are too strong [1, 18, 55-61]. It is possible that the CA-C interface has evolved to avoid a dewetting transition in order to maintain the relatively weak interactions required for successful assembly.

Since the CA-C dimerization interface is typical of protein binding surfaces, our results raise the possibility that many proteins sit near a dewetting transition. This scenario is consistent with the idea that biological systems tend to position themselves near phase transitions to enable sensitive regulation [62]. It can be tested by applying the computational methodologies described here to other proteins to identify mutations that bring about or eliminate dewetting during association, and then experimentally investigating the effects of these mutations on binding affinities. Considering the requirement for weak binding interactions in biological assembly reactions and the prevalence of assembly in biological systems, it is of great importance to extend our understanding of the relationship between sequence, structure, and association behavior to include the effects of the molecularity of water, as we have done here for CA-C.



METHODS 

All simulations were performed using GROMACS 4.0 [63], modified to enable importance sampling of the distribution of the number of water molecules within arbitrarily defined regions with the INDUS algorithm [25, 37]. We used the OPLS-AA force field [64] to represent protein atoms and the TIP3P model [65] to represent water molecules. To check that the choice of force field did not affect the results, additional simulations were performed using the Amber 99SB force field [66] and SPC/E water [67]. As described below, the results were qualitatively unchanged by the different water model and force field.

FIG. 6. Initial setup for the water density distribution calculations for the HIV capsid protein. The figure shows a cross-section of the system in a plane orthogonal to the dimerization interface. The proteins are placed along the diagonal axis of the water box. The region to the right of the water box shows the location of the vapor slab discussed in the text, and the gray bar shows the location of the extra (peripheral) INDUS biasing potential used to keep the vapor slab at the periphery of the box. The primary INDUS biasing potential applied to the dimer interface region is not shown.

Our simulations are based on the crystal structure of CA-C, PDB code 1A43 [19], which corresponds very closely to the structure of two CA-C domains located at hexameric and pentameric interfaces within the mature HIV capsid crystal structure [42]. We prepared a series of initial states, in each of which the two monomers were separated by a different distance D along the axis perpendicular to the dimer interface. The protein hydrogen atoms were converted into virtual sites [68] to enable a larger timestep. The resulting structure was then solvated in a cuboidal box of water molecules of sufficient dimensions to ensure that protein atoms were separated from the box edges by at least 10 Å. This protocol resulted in boxes with 7000-8000 water molecules, depending on the separation distance D. Sodium and chloride ions were then added to create a bulk salt concentration of 100 mM, with additional ions added to achieve charge neutrality.

As shown in Fig. 6, the box was then extended by 14 Å in the direction perpendicular to the dimer interface to create a vapor slab that enables density fluctuations. The position of the vapor slab was held constant at the edge of the box throughout all simulations by applying an extra (peripheral) INDUS potential that biases the central 2 Å of the slab toward zero waters. The system then phase-separates into a region of water and a vapor slab at the periphery. Note that this INDUS potential is in addition to, and separate from, the potential that is applied to bias the number of waters N within the dimer interfacial region toward a specified number; the potential applied within the vapor slab region has no direct effect on N or its distribution P_v(N). The presence of the vapor slab enables volume fluctuations of the water region and maintains the system at a pressure equal to the water vapor pressure. The difference between the water vapor pressure and atmospheric pressure has a negligible effect on P_v(N), since the PV contribution to the free energy of the interfacial region is negligible compared to surface tension contributions for the volumes considered. The distributions P_v(N) obtained using this method were shown in Ref. [37] to be equivalent to those obtained from constant-pressure simulations using the Langevin piston method. This method of enabling volume fluctuations was chosen because it does not require calculating the virial contribution from the umbrella biasing forces, and because the dynamics of the volume fluctuations may be more realistic than those produced by a more traditional constant-pressure algorithm. Simulations were also performed with mutated residues; all mutant structures were built from the wild type by replacing the side chains of the mutated residues using a VMD module.

Each system thus generated was energy-minimized until the maximum force was smaller than 500 kJ mol^-1 nm^-1 and then heated to T = 300 K. Subsequent molecular dynamics was performed with the backbone atoms of the protein constrained, in order to maintain constant relative orientations of the two monomers during the calculation of P_v(N). Molecular dynamics runs were performed in the NVT ensemble with a time step of 4 fs. The temperature was kept at 300 K using the velocity-rescaling modification of the Berendsen thermostat [69]. The protein and solvent were each coupled to separate thermostats with time constants of 0.1 ps. The SETTLE algorithm [70] was used to constrain H-H and O-H distances in water molecules, and all other bond lengths were constrained using LINCS [71]. Electrostatic interactions were calculated using the particle-mesh Ewald (PME) algorithm [72]. Van der Waals interactions were switched off between 10 Å and the 12 Å cutoff. The non-bonded neighbor list was updated every 20 fs. Each system was integrated for 10 ns to ensure equilibration.
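For readers setting up a comparable run, the settings listed above can be collected as GROMACS .mdp parameters. The Python sketch below writes such a fragment; it is our reconstruction, not the authors' input file. The option names are standard .mdp keys, but they should be checked against the GROMACS version in use, and settings not given in the text (the real-space Coulomb and neighbor-list cutoffs, and how the backbone atoms were held fixed) are deliberately omitted. The derived values follow from the text: 10 ns at 4 fs is 2,500,000 steps, and a 20 fs neighbor-list interval is every 5 steps.

```python
# Hypothetical .mdp reconstruction of the run settings described in the text.
# Not the authors' input file: PME real-space and neighbour-list cutoffs, and
# the way backbone atoms were held fixed, are not given and are omitted.

TIMESTEP_PS = 0.004          # 4 fs time step (hydrogens as virtual sites)
NEIGHBOR_UPDATE_PS = 0.020   # neighbour list updated every 20 fs
RUN_LENGTH_PS = 10_000.0     # 10 ns equilibration run


def mdp_parameters() -> dict[str, str]:
    """Collect the stated settings as .mdp key/value strings."""
    return {
        "integrator": "md",
        "dt": f"{TIMESTEP_PS}",
        "nsteps": str(round(RUN_LENGTH_PS / TIMESTEP_PS)),
        "nstlist": str(round(NEIGHBOR_UPDATE_PS / TIMESTEP_PS)),
        "tcoupl": "v-rescale",              # velocity-rescaling thermostat
        "tc-grps": "Protein Non-Protein",   # protein and solvent coupled separately
        "tau_t": "0.1 0.1",                 # ps
        "ref_t": "300 300",                 # K
        "pcoupl": "no",                     # NVT; volume fluctuates via the vapor slab
        "coulombtype": "PME",
        "vdwtype": "switch",
        "rvdw_switch": "1.0",               # nm (10 Å)
        "rvdw": "1.2",                      # nm (12 Å)
        "constraints": "all-bonds",         # water is handled by SETTLE via its topology
        "constraint_algorithm": "lincs",
    }


def to_mdp(params: dict[str, str]) -> str:
    """Format parameters as aligned 'key = value' lines."""
    return "".join(f"{key:<22}= {value}\n" for key, value in params.items())
```

Generating the file from named constants keeps the derived step counts consistent if the time step or run length is changed.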

After equilibration, the indirect umbrella sampling (INDUS) algorithm [37] was used to calculate distributions of the number of waters within a 'channel', or region between the two monomers, for each system. An appropriate definition of the channel is essential to monitor the existence of a dewetting transition; inclusion of additional subregions in which dewetting is unfavorable can obscure the results. An initial region was generated by determining the convex hull [73] defined by the positions of the α-carbons of 12 residues at the dimer interface of each monomer: I150, L151, L172, V181, W184, M185, E187, T188, L189, V191, Q192, and N193. A preliminary INDUS run was then performed to bias the channel toward zero waters by applying a biasing potential U_bias that increases with the number of waters n_w in the channel, with strength k = 1 kJ nm^-2. The region was then subdivided into cells of size 1 Å, and cells with an average water occupation greater than 0.01 were eliminated from the channel region. We find that a channel with an hourglass shape naturally arises from this procedure (Fig. 3A). We performed the procedure independently for each separation D and set of mutations. Some mutations lead to somewhat larger channel volumes under this procedure; to facilitate comparison of P_v(N) distributions calculated for different sets of mutations, the threshold water density within a cell was adjusted to achieve approximately the same channel volume as found for the wild type protein. Once the channel was defined, umbrella sampling was performed in a series of windows whose centers were spaced at intervals of 3 waters up to 15. For each window, the system was run for 2 ns to allow equilibration under the bias potential, followed by 10 ns of data collection. Finally, the unbiased probability distribution was determined using the weighted histogram analysis method (WHAM) [74].
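The final WHAM step can be sketched in a few lines. The Python below is a generic self-consistent WHAM for integer N with harmonic umbrella potentials (κ/2)(N - N*)², expressed in units of k_BT; it is an illustrative sketch, not the implementation of Ref. [74] or of the modified GROMACS used here. Assumptions: the bias acts on the integer count N rather than on INDUS's smoothed count, and the spring constant κ is a free parameter because the umbrella strength is not specified above. Window centers of 0, 3, ..., 15 would match the stated spacing, but the first center is our assumption. Function names are hypothetical.

```python
import math
from typing import Sequence


def harmonic_bias(n: int, center: float, kappa: float) -> float:
    """Umbrella potential in units of k_B T: (kappa / 2) * (n - center)**2."""
    return 0.5 * kappa * (n - center) ** 2


def wham(histograms: Sequence[Sequence[float]], centers: Sequence[float],
         kappa: float, tol: float = 1e-10, max_iter: int = 100_000) -> list[float]:
    """Self-consistent WHAM estimate of the unbiased P(N) for N = 0 .. n_bins - 1.

    histograms[i][n] is the number of frames with N = n in window i.
    """
    n_bins = len(histograms[0])
    frames = [sum(h) for h in histograms]            # samples per window
    counts = [sum(h[n] for h in histograms) for n in range(n_bins)]
    bias = [[harmonic_bias(n, c, kappa) for n in range(n_bins)] for c in centers]
    f = [0.0] * len(centers)                         # window free energies, k_B T
    p = [0.0] * n_bins
    for _ in range(max_iter):
        # P(N) = sum_i h_i(N) / sum_i M_i exp(f_i - U_i(N))
        for n in range(n_bins):
            denom = sum(m * math.exp(fi - b[n]) for m, fi, b in zip(frames, f, bias))
            p[n] = counts[n] / denom
        total = sum(p)
        p = [x / total for x in p]
        # exp(-f_i) = sum_N P(N) exp(-U_i(N)); fix the gauge with f_0 = 0
        f_new = [-math.log(sum(p[n] * math.exp(-b[n]) for n in range(n_bins)))
                 for b in bias]
        f_new = [x - f_new[0] for x in f_new]
        converged = max(abs(a - c) for a, c in zip(f_new, f)) < tol
        f = f_new
        if converged:
            break
    return p
```

Bins never visited in any window receive P(N) = 0, so in practice the windows must overlap enough that every N of interest, including N = 0, is sampled.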

Additional simulations were performed to calculate P_v(N) near isolated monomers. For these runs, a single monomer was extracted from the crystal structure, solvated, minimized, and equilibrated as described for the dimer simulations. To facilitate comparison with the dimer simulations, for the wild type monomer and each set of mutations studied, the channel region was defined as the half of the channel nearest the monomer, as determined for the wild type dimer simulation with a separation of D = 4 Å. The distribution P_v(N) within this region was then calculated as described above for the dimer simulations.

Finally, to assess the effect of mutations on the structure of the dimer, we simulated the Q192L/E180L/S178A dimer at separation D = … for 100 ns. We measured an RMSD of 2.6 Å between the positions of backbone heavy atoms and those of the wild type crystal structure (for comparison, the RMSD is 2.0 Å for the wild type structure). While this simulation is not long enough to definitively determine the stability of the complex, it is consistent with the experimental observation that mutation of the same residues to alanine (Q192A/E180A/S178A) does not impair the stability of the complex.

P_v(N) calculation with the Amber force field and SPC/E water. Additional simulations were performed on dimerization of the wild type CA-C protein and the mutants TM and QM using the Amber 99SB force field and SPC/E water. The behavior of the wild type protein was unchanged with respect to the results described in the main text. As shown in Fig. 7A, the results are qualitatively unchanged for the TM and QM proteins, but the critical dewetting distance is shifted to slightly smaller values, approximately 3.7 Å for TM and 4.0 Å for QM. It is not surprising that the critical distance changes slightly upon variations in the water model, since the critical distance is controlled by a balance of surface tension effects and the chemical potential difference between liquid water and vapor within the interfacial region [30].
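The balance invoked here can be illustrated with the standard macroscopic (capillary) estimate for two parallel, homogeneous rectangular plates with sides l1 and l2: the vapor-filled gap is favored when D < D_c = -2γ cos θ / (Δp + 2γ(l1 + l2)/(l1 l2)), where γ is the liquid-vapor surface tension, θ the contact angle of water on the plates, and Δp the pressure difference between the liquid and the coexisting vapor, a proxy for the chemical potential difference. The edge term accounts for the liquid-vapor interface that bounds the vapor-filled gap. The Python sketch below evaluates this expression. Protein surfaces are neither flat nor homogeneous, so the inputs are illustrative assumptions and the result indicates only how the terms compete, not the CA-C values.

```python
import math


def critical_separation(gamma: float, contact_angle_deg: float, delta_p: float,
                        l1: float, l2: float) -> float:
    """Macroscopic D_c (m) for two l1 x l2 plates; 0.0 if the plates are not hydrophobic.

    gamma in N/m, delta_p in Pa, plate sides in m. The edge term accounts for the
    liquid-vapor interface that bounds the vapor-filled gap.
    """
    cos_theta = math.cos(math.radians(contact_angle_deg))
    if cos_theta >= 0.0:
        return 0.0  # theta <= 90 degrees: the filled gap is always favored in this model
    edge_term = 2.0 * gamma * (l1 + l2) / (l1 * l2)
    return -2.0 * gamma * cos_theta / (delta_p + edge_term)
```

For macroscopic plates with θ = 180°, the estimate approaches 2γ/Δp, about 1.4 μm for water at ambient conditions. For nanometer-sized plates the edge term dominates Δp, so D_c ≈ -cos θ · l1 l2/(l1 + l2), which falls in the Å range (about 5 Å for 2 nm × 2 nm plates with θ = 120°). In this limit the estimate is set mainly by θ, that is, by the balance of water-water and water-surface attractions, which plausibly helps explain why a change of water model can shift D_c.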

Monomer conformational transitions. Although solution measurements indicate that the CA-C monomer can adopt an alternative, partially unfolded conformation [47, 50, 75, 76], thermodynamic and kinetic measurements of the association process are most consistent with a mechanism in which monomers adopt the completely folded conformation prior to association [75]. We did not observe partial unfolding during our simulations of isolated monomers or of monomers at large separation, suggesting that such a conformational transition occurs on timescales longer than 100 ns. We thus simulate the approach of two fully folded CA-C monomers.